In this doc I have all the code to do the analysis we spoke about yesterday, and to make a couple of plots. Feel free to browse through as this might be helpful if you weren’t 100% sure what was going on yesterday. Click on the button that says code to show the R code. You may use one of the plots (they show the same info so pick one) in your report, or you can make a different one that you like better!
After collecting data I like to document my analysis in a report-style document. I do this for my own benefit and to improve the reproducibility and transparency of my science. It allows me to quickly look up code I’ve used previously, with a description of how or why I used it. It also pulls the sheet off of the analysis process, which is often only vaguely described in papers. The report accompanies the manuscript, and recently I’ve been submitting it as part of the supplementary material.
I use the R markdown package you see here to write these reports. R markdown operates out of R studio, but with what I consider to be some major improvements. First of all, the document is split into two parts: code chunks and writing space. R code is written in the chunks, and plain text (you don’t need hashes!) anywhere else. Some consider R markdown to be a better writing program than word. If any of you are interested in doing a Masters or Honours (or any form of data analysis down the line), I think it’s well worth taking the time to have a look at.
To open an R markdown file go to File -> New File -> R markdown
The functional power of R can be substantially improved by installing and loading packages. In the code chunk below we load in some packages to make coding easier, and some that allow us to fit linear models.
library(tidyverse) # for tidy coding and ggplot2
library(lme4) # for the lmer and glmer mixed model functions
library(lmerTest) # Used to get p-values for lmer models using simulation. It over-writes lmer() with a new version
library(glmmTMB) # for faster glms
library(kableExtra) # for scrolling tables
library(DT) # for even better tables
library(car) # for the good Anova function - gives us type II and III which are needed for unbalanced designs
data <- read_csv("Water_strider_Data.csv")
# Create a function to build HTML searchable tables - you can copy this and it will work!
my_data_table <- function(df){
datatable(
df, rownames=FALSE,
autoHideNavigation = TRUE,
extensions = c("Scroller", "Buttons"),
options = list(
dom = 'Bfrtip',
deferRender=TRUE,
scrollX=TRUE, scrollY=400,
scrollCollapse=TRUE,
buttons =
list('pageLength', 'colvis', 'csv', list(
extend = 'pdf',
pageSize = 'A4',
orientation = 'landscape',
filename = 'water_strdier_data')),
pageLength = 90
)
)
}
# now use the function we made above, all you need to do is put your dataframes name in
my_data_table(data)
First lets make sure our variables are coded correctly
We do this because we don’t want R to interpret our data incorrectly. For example, if we didn’t change anything, tank is coded as a continuous numeric variable. Tanks are relatively indepedent of one another, and Tank 10 is no larger than Tank 1 which is currently assumed as a continuous variable. We change this to a factor, which makes this a categorical variable.
data <- data %>%
mutate(id = as.factor(id),
tank = as.factor(tank),
bias = as.factor(bias))
# reorder the sex ratio treatments so that they flow logically
data$bias <- fct_relevel(data$bias, "E", after = 1)
Writing the model
Lets look at how sex ratio and the colour we painted females affected the number of mating attempts. We predicted that as the sex ratio becomes male biased, females will be subject to more mating attempts from males. If this is the case, we expect that this is costly for females and that females will mate more frequently under male-biased conditions to avoid these costs. Unfortunately we did not observe enough matings to conduct a meaninful analysis.
It is plausible that paint colour may affect female behaviour, which could affect the number of times males attempt to mate with that female.
I don’t have this in my dataset, but you can also add the design change variable (where some water striders were painted 10 mins before the trial and others at least 24 hours before) as anothe rpredictor variable. Together with paint colour, including these as predictors allows us to test whether sex ratio affected number of mating attempts per female after accoutning for paint colour and the design change.
mating_attempts_model <- glmmTMB(mating_attempts ~ bias + colour,
family = gaussian,
data = data)
To have a look at the results, you can use the Anova function (from the car package)
Anova(mating_attempts_model, type = "II")
## Analysis of Deviance Table (Type II Wald chisquare tests)
##
## Response: mating_attempts
## Chisq Df Pr(>Chisq)
## bias 11.9697 2 0.002517 **
## colour 3.4133 4 0.491183
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anovaresults <- aov(mating_attempts_model)
There are 3 different types and I always forget which to use. So I generally google until I find this. In our case, type II will get the job done, because we don’t have an interaction in our model.
Or you can use the summary() function.
summary(mating_attempts_model)
## Family: gaussian ( identity )
## Formula: mating_attempts ~ bias + colour
## Data: data
##
## AIC BIC logLik deviance df.resid
## 499.0 518.6 -241.5 483.0 77
##
##
## Dispersion estimate for gaussian family (sigma^2): 17.2
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.5902 1.0577 0.558 0.57684
## biasE 2.9294 0.9916 2.954 0.00314 **
## biasM 3.6724 1.4411 2.548 0.01082 *
## colourmetallic -1.4916 1.5080 -0.989 0.32262
## colourred -0.1518 1.3625 -0.111 0.91132
## colourwhite 1.0656 1.3730 0.776 0.43767
## colouryellow -0.9671 1.3725 -0.705 0.48103
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note how summary() provides a different output to Anova(). This is because Anova() is looking for a general effect of Sex_ratio. You can’t tell which groups differ from one another. With summary() it compares each group to a reference level (defaults to the first treatment alphabetically). This means you get an estimate of the difference between each group with the reference, plus an associated P-value.
summary() works in a similar way to the Tukey’s test. However, the tukey’s test output looks a little different. It provides an estimate of the difference between each Sex ratio treatment (called diff), with confidence intervals (lwr and upr) and a P value (called p adj)
TukeyHSD(anovaresults)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mating_attempts_model)
##
## $bias
## diff lwr upr p adj
## E-F 3.1500000 0.7011120 5.598888 0.0081149
## M-F 3.8409091 0.3543366 7.327482 0.0273168
## M-E 0.6909091 -2.9547411 4.336559 0.8933060
##
## $colour
## diff lwr upr p adj
## metallic-blue -1.4485490 -5.755380 2.858282 0.8806674
## red-blue -0.1482576 -4.075865 3.779350 0.9999715
## white-blue 1.0658690 -2.922074 5.053812 0.9447518
## yellow-blue -0.9477674 -4.935710 3.040175 0.9635206
## red-metallic 1.3002914 -3.099787 5.700370 0.9220409
## white-metallic 2.5144179 -1.939600 6.968436 0.5167820
## yellow-metallic 0.5007816 -3.953237 4.954800 0.9978477
## white-red 1.2141266 -2.874343 5.302597 0.9207463
## yellow-red -0.7995098 -4.887980 3.288960 0.9821108
## yellow-white -2.0136364 -6.160102 2.132829 0.6571704
So that’s the differences sorted. If we just want to look at the means for mating attempts per female in each treatment we can predict them from the original model.
# For prediction lets drop the colour variable from our model to make things simpler
mating_attempts_model_reduced <- glmmTMB(mating_attempts ~ bias,
family = gaussian,
data = data)
new_data <- data %>%
select(bias) %>%
distinct()
predict_means <- predict(mating_attempts_model_reduced, newdata = new_data, type = "response", se.fit = TRUE, re.form = ~0) %>%
unlist() %>%
as_tibble()
pred_1a <- predict_means %>%
slice(1:3) %>%
rename(mean_estimate = "value")
pred_1b <- predict_means %>%
slice(4:6) %>%
rename(SE = "value")
predict_means <- cbind(new_data, pred_1a, pred_1b) %>%
rename(mating_attempts = mean_estimate) %>%
mutate(Lower95CI = mating_attempts - (SE * 1.96),
Upper95CI = mating_attempts + (SE * 1.96))
predict_means_clean <-
predict_means %>%
mutate(bias = fct_recode(predict_means$bias, `Female biased` = "F", Equal = "E", `Male biased` = "M")) %>%
rename(`Sex ratio treatment` = bias)
kable(predict_means_clean) %>%
kable_styling()
| Sex ratio treatment | mating_attempts | SE | Lower95CI | Upper95CI |
|---|---|---|---|---|
| Female biased | 0.2500011 | 0.6375838 | -0.9996632 | 1.499665 |
| Male biased | 4.0909038 | 1.2751676 | 1.5915753 | 6.590232 |
| Equal | 3.3999985 | 0.7721523 | 1.8865799 | 4.913417 |
Note that the female biased treatment has a negative lower CI. This is a bug because it’s impossible to have negative mating attempts. Replace the lower limit with 0.
predict_means[1, 4] = 0
predict_means_clean[1, 4] = 0
There are two main ways to build plots in R: using the base R function plot() or using the tidyverse function ggplot(). I personally use ggplot so that’s what we’ll do here.
Lets make a barplot and a point range plot, with the sex ratio treatment on the x axis and the number of mating attempts on the y axis.
ggplot(predict_means_clean) +
geom_bar(aes(x=`Sex ratio treatment`, y=mating_attempts), stat="identity", fill="skyblue", alpha=0.7) +
#geom_pointrange(aes(x=`Sex ratio treatment`, y=mating_attempts, ymin=Lower95CI, ymax=Upper95CI), colour="orange", alpha=1, size=1.3) +
geom_errorbar( aes(x=`Sex ratio treatment`, ymin=Lower95CI, ymax=Upper95CI), width=0.1, colour="orange", alpha=0.9, size=1) +
labs(x = "Sex ratio treatment", y = "Mean number of mating attempts per female\n (95% CIs)") +
theme_bw() +
theme(legend.position = "none",
panel.grid.major.x = element_blank(),
text = element_text(size=16))
We can also alternatively plot this like so, which has the advantage of showing the raw data (in grey), with the mean and 95% CIs in orange. But the raw data makes the pattern less clear, so the choice is yours (or make something completely different!).
# ggplot build plots in layers, after each layer you can add another by using the + symbol
data <- data %>%
mutate(bias = fct_recode(data$bias, `Female biased` = "F", Equal = "E", `Male biased` = "M")) %>%
rename(`Sex ratio treatment` = bias)
ggplot(data = data) +
geom_jitter(data = data, aes(x = `Sex ratio treatment`, y = mating_attempts, fill = `Sex ratio treatment`), width = 0.1, size = 5, alpha = 0.5) +
geom_errorbar(data = predict_means_clean, aes(x = `Sex ratio treatment`, ymax = Upper95CI, ymin = Lower95CI),
colour = "orange", fill = "black", width = 0, size = 1.2) +
geom_point(data = predict_means_clean, aes(x = `Sex ratio treatment`, y = mating_attempts),
size = 7, shape = 21, fill = "orange", colour = "black") +
labs(x = "Sex ratio treatment", y = "Number of mating attempts per female\n (95% CIs)") +
theme_bw() +
theme(legend.position = "none",
panel.grid.major.x = element_blank(),
text = element_text(size=16))